Abstract

This paper resolves a common complexity issue in the Bethe approximation of statistical physics and the sum-product Belief Propagation (BP) algorithm of artificial intelligence. The Bethe approximation reduces the problem of computing the partition function in a graphical model to that of solving a set of non-linear equations, the so-called Bethe equation. On the other hand, the BP algorithm is a popular heuristic method for estimating marginal distributions in a graphical model. Although they were inspired and developed from different directions, Yedidia, Freeman and Weiss (2004) established a close connection: the BP algorithm solves the Bethe equation if it converges (however, it often does not). This naturally motivates the following important question for understanding their limitations and empirical successes: is the Bethe equation computationally easy to solve?

We present a message passing algorithm solving the Bethe equation in a polynomial number of operations for arbitrary binary graphical models of n variables where the maximum degree in the underlying graph is O(log n). Our algorithm, an alternative to BP that fixes its convergence issue, is the first fully polynomial-time approximation scheme for the BP fixed point computation in such a large class of graphical models, while the approximate fixed point computation is known to be (PPAD-)hard in general. We believe that our technique is of broader interest for understanding the computational complexity of the cavity method in statistical physics.



1 Introduction 



In recent years, graphical models (also known as Markov random fields) defined on graphs
have been studied as powerful formalisms modeling inference problems in numerous areas includ- 
ing computer vision, speech recognition, error-correcting codes, protein structure, networking, 
statistical physics, game theory and combinatorial optimization. Two central problems, com- 
monly addressed in these applications involving graphical models, are computing the marginal 
distribution and the so-called partition function. It is well-known that the inference problems 
are computationally hard in general [5]. Due to such a theoretical barrier, efforts have been 
made to develop heuristic methods. 

The sum-product Belief Propagation (BP) algorithm (first proposed by Pearl [19]) and its 
variants (e.g. Survey Propagation) are such heuristics, driven by certain experimental thoughts, 
for computing the marginal distribution. Their appeal lies in the ease of implementation as well
as optimality in tree-structured graphical models (models which contain no cycles). BP (and
message-passing algorithms in general) can be thought of as an updating rule on a set of messages:

$$m^{t+1} = f(m^t),$$

where $m^t$ is the multi-dimensional vector of messages at the t-th iteration, and $f$ describes
the updating rule (or BP operator). Two major hurdles to understanding such a message-passing
algorithm are its convergence (i.e. does $m^t$ converge to a fixed point $m^*$?) and correctness (i.e. is $m^*$ good
enough?). It is known that the BP iterative procedure always has a fixed point $m^*$ due to the
Brouwer fixed point theorem. However, BP can oscillate far from a fixed point in models with 
cycles, and only several sufficient convergence conditions [25, 22, 14, 15] have been established 
in the last decade. More importantly, BP can have multiple fixed points, and even when it is 
unique, it may not be the correct answer. Significant efforts [14, 24, 26] were made to understand 
BP fixed points, while their precise approximation quality and a rigorous understanding of
their limitations still remain a mystery. Regardless of this theoretical picture, the BP
algorithm performs empirically well in many applications [12, 18, 10]. For example, the highly
successful turbo codes [3] can be interpreted in practice as BP [17], and decisions guided by BP
are also known to work well for solving satisfiability problems [20].

The Bethe approximation [2, 26] and its variants (e.g. Kikuchi approximation [9]), originally 
developed in statistical physics of lattice models, are currently used as powerful approximation 
schemes for computing the (logarithm of) partition function in many applications. The Bethe 
approximation suggests using the following quantity as an approximation for the logarithm of
the partition function:

$$F(y^*) \quad \text{where} \quad \nabla F(y^*) = 0.$$

$F$, $\nabla F(y^*) = 0$ and $y^*$ are called the (minus) Bethe free energy function, Bethe equation
and Bethe equilibrium, respectively. The statistical physics prediction suggests its asymptotic
correctness in random sparse graphical models, and rigorous evidence for this is known in particular
models [1, 8, 4]. Efforts have also been made to estimate and characterize its error
[6, 21]. However, the error still remains uncontrollable for models with many cycles.

Yedidia, Freeman and Weiss [26] established a somewhat surprising connection between the 
BP algorithm and the Bethe approximation: if BP converges, it solves the Bethe equation. 
Equivalently, the BP fixed point equation $f(m^*) = m^*$ is in essence equivalent to the Bethe
equation $\nabla F(y^*) = 0$. This naturally leads to the following common computational question
for both: is the BP fixed point computation computationally easy? Formally speaking,

Q. Given $\varepsilon > 0$, is it possible to design a deterministic iterative algorithm finding $m^*$ satisfying^1

$$(1-\varepsilon)\, f(m^*) \le m^* \le (1+\varepsilon)\, f(m^*),$$

in a polynomial number of bitwise operations with respect to $1/\varepsilon$ and the dimension of the vector $m^*$?

^1 Note that it is impossible to compute the exact solution $m^*$ with $f(m^*) = m^*$ since it is irrational in general.

Such an algorithm can be used as an alternative to BP with a provably fast convergence rate (i.e. fixing the convergence issue of BP), and it eliminates the need for a convergence analysis of BP. Even though it may not converge to the correct answer, it can, at least, provide guidance toward it [20]. Further, it confirms the efficiency of the Bethe approximation scheme as well, since $m^*$ satisfying the above inequality provides $y^*$ with $\|\nabla F(y^*)\| \le \varepsilon$. Efforts to design such algorithms were made [23, 27], but no rigorous analysis of their convergence rates is known. The authors of [4] provide an algorithm with a provably polynomial convergence rate, but that work is for a very specific graphical model (i.e. the uniform distribution on independent sets of sparse graphs). It is far from clear whether such a polynomially convergent algorithm exists for general graphical models. This is primarily because approximately computing a fixed point or a local minimum is known to be (PPAD- or PLS-)hard in general [7].

1.1 Our Contribution 

The main result of this paper is the following answer A (cf. Theorem 2 in Section 3) to the
question Q for the BP operator $f$ and arbitrary sparse binary graphical models. To state it
formally, we let $n$ be the number of nodes and $\Delta$ be the maximum degree in the underlying
graph.

A. Given $\varepsilon > 0$, there exists a deterministic iterative algorithm finding $m^*$ satisfying

$$(1-\varepsilon)\, f(m^*) \le m^* \le (1+\varepsilon)\, f(m^*)$$

in $2^{O(\Delta)} n^2 \varepsilon^{-4} \log^3(n\varepsilon^{-1})$ iterations.

In this paper, we call a message $m^*$ satisfying the above inequality an $\varepsilon$-approximate BP
fixed point. In what follows, we explain the algorithm in detail.

The known equivalence [26] between the BP fixed point equation and the Bethe equation
implies that the question Q is equivalent to the following.

Q'. Given $\varepsilon > 0$, is it possible to design a deterministic iterative algorithm finding $y^*$ satisfying

$$\|\nabla F(y^*)\| \le \varepsilon,$$

in a polynomial number of bitwise operations with respect to $1/\varepsilon$ and the dimension of the
domain $D$ of the Bethe free energy function $F$?

However, we remind the reader that it is still far from obvious whether it is computationally
'easy' to find such a near-stationary point or an approximate local minimum (or maximum).
Natural attempts are gradient-descent algorithms to find a local minimum or maximum of $F$:
iteratively update $y(t)$ as

$$y(t+1) = y(t) + \alpha \nabla F(y(t)),$$

where $\alpha \in \mathbb{R}$ is the (appropriately chosen) step-size. The main issue here is that the gradient-descent
algorithm may not find a near-stationary point if $y(t)$ hits the boundary of $D$ in one of
its iterations (and projection is required). Hence, the main strategy in [4] to avoid the hitting
issue lies in (a) understanding the behavior of the gradient $\nabla F$ close to the boundary of $D$ and
(b) designing an appropriately small step-size in the gradient-descent algorithm based on the
understanding from (a).

The main technical challenge in applying this strategy to general binary graphical models (beyond the specific model in [4]) lies in (a). The domain $D$ is simply $[0, 1/2]^n$ in [4] since the Bethe free energy function $F$ is determined by node marginal probabilities in the uniform independent-set model. One can observe that the proof strategy in [4] is very sensitive and immediately fails even for the non-uniform independent-set model, which has the domain $D = [0,1]^n$. Furthermore, the more significant issue is that in a general graphical model the domain becomes larger, i.e. $D = [0,1]^{n+m}$ where $m$ is the number of edges in the underlying graph. This is because the Bethe free energy must also account for pairwise (or edge) marginal probabilities. One can check that approaches similar to that of [4] fail in the larger domain $D = [0,1]^{n+m}$. To overcome this technical issue, we first observe that at stationary points of $F$, the pairwise marginal probabilities satisfy certain quadratic equations in terms of the node marginal probabilities in binary graphical models. This allows us to express the Bethe free energy again in terms of node marginal probabilities only, i.e. $D = [0,1]^n$. We then study this 'modified' Bethe expression $F^*$ to avoid the hitting issue, and end up with an appropriately small step-size in the gradient-descent algorithm. Moreover, we eliminate the need to decide such a small step-size explicitly in the algorithm by designing an elegant time-varying projection scheme.

We later realized that the 'modified' Bethe expression $F^*$ was already proposed by Teh and
Welling [23], where they suggested gradient algorithms to minimize $F^*$ using sigmoid functions.
The main difference in our work is that we study the behavior of the gradient $\nabla F^*$ close to the
boundary of its domain and guarantee that the gradient-descent algorithm does not hit the
boundary without sigmoid functions. The success of our rigorous convergence rate analysis,
which was missing in the work of Teh and Welling (2001), primarily relies on this difference. It
is also crucial for extending the algorithm design to non-binary graphical models, as we describe in
Section 4.

One can observe that our gradient-descent algorithm is implementable as a 'BP-like' iterative
message passing algorithm: each node maintains a message at each iteration and passes it to
its neighbors. We prove that it terminates in $2^{O(\Delta)} n^2 \varepsilon^{-4} \log^3(n\varepsilon^{-1})$ iterations for binary graphical
models, at which point it has found an $\varepsilon$-approximate BP fixed point. From a complexity point of view, the only
remaining issue is that each node may need to maintain irrational messages (of infinitely
many bits). We further show that a polynomial number (with respect to $1/\varepsilon$, $n$ and $2^{\Delta}$) of bits
to approximate each message suffices, and hence the algorithm consists of only a polynomial
number of bitwise operations in total. Namely, it is a fully polynomial-time approximation
scheme (FPTAS) for computing an approximate BP fixed point in sparse binary graphical models
where $\Delta = O(\log n)$.

1.2 Organization 

In Section 2, we provide background on graphical models, Belief Propagation and the Bethe approximation.
In Section 3, we describe our algorithm and its time complexity for binary graphical
models. In Section 4, we discuss at a high level how to extend the result to non-binary graphical models.
From our discussion in Section 4, one can observe that it is not hard to obtain a
similar convergence rate result as well, but we omit further details in this paper.

2 Graphical Models 

We first introduce a class of joint distributions defined with respect to (undirected) graphs,
which are called (pairwise) Markov random fields (MRFs) [16]. Specifically, let $G = (V, E)$ be
an undirected graph with the vertices being denoted by $V$ with $|V| = n$, and the edges $E \subseteq \binom{V}{2}$
denoting a set of unordered pairs of vertices. The vertices of $G$ label a collection of random
variables $x = \{x_v \mid v \in V\}$. Our primary focus in this paper is on binary random variables, i.e.,
$x_v \in \{0, 1\}$ for all $v \in V$.

Now consider the following joint distribution on $\{0,1\}^n$ that factors according to $G$:

$$p(x) = \frac{1}{Z} \prod_{v \in V} \psi_v(x_v) \prod_{(u,v) \in E} \psi_{u,v}(x_u, x_v) \qquad \text{for } x \in \{0,1\}^n.$$

Here, $\psi_{u,v}$ and $\psi_v$ are non-negative functions on $\{0,1\}^2$ and $\{0,1\}$, respectively. These
local functions are called potential functions or compatibility functions. The normalizing factor
$Z$ is called the partition function:

$$Z = \sum_{x \in \{0,1\}^n} \prod_{v \in V} \psi_v(x_v) \prod_{(u,v) \in E} \psi_{u,v}(x_u, x_v). \qquad (1)$$
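To make definition (1) concrete, the following minimal sketch evaluates $Z$ by direct enumeration on a toy model. The data layout (`psi_node[v][x]`, `psi_edge[(u, v)][xu][xv]`) and the example potentials are assumptions made for illustration; the enumeration takes $2^n$ steps, which is exactly the cost that approximation schemes try to avoid.

```python
import itertools


def partition_function(n, edges, psi_node, psi_edge):
    """Brute-force Z of equation (1) for a binary pairwise MRF (tiny n only).

    psi_node[v][x] and psi_edge[(u, v)][xu][xv] hold the potentials; this
    layout is a hypothetical choice made for the sketch.
    """
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        weight = 1.0
        for v in range(n):
            weight *= psi_node[v][x[v]]
        for (u, v) in edges:
            weight *= psi_edge[(u, v)][x[u]][x[v]]
        total += weight
    return total


# Toy path 0 - 1 - 2 with uniform node potentials and an edge potential
# that favors agreement (values chosen arbitrarily).
edges = [(0, 1), (1, 2)]
psi_node = [[1.0, 1.0] for _ in range(3)]
psi_edge = {e: [[2.0, 1.0], [1.0, 2.0]] for e in edges}
Z = partition_function(3, edges, psi_node, psi_edge)
```

For this path, summing out the two end nodes gives a factor of 3 each for either value of the middle node, so the enumeration should agree with $Z = 2 \cdot 3 \cdot 3 = 18$.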

Finally, some notation. Let $\mathcal{N}(v)$ be the set of neighbors of a vertex $v \in V$, $d_v := |\mathcal{N}(v)|$
be the degree of $v \in V$, and $\Delta := \max_v d_v$ be the maximum degree in the graph $G$. Further, we
define

$$\psi_* := \max_{(u,v) \in E,\; x_u, x_v \in \{0,1\}} \left\{ e^{|\ln \psi_v(x_v)|},\; e^{|\ln \psi_{u,v}(x_u, x_v)|} \right\}.$$

In this paper, we primarily focus on the case $\psi_* = O(1)$.^2

2.1 Belief Propagation

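The BP algorithm maintains messages $\{m^t_{u \to v}(\cdot), m^t_{v \to u}(\cdot)\} := \{m^t_{u \to v}(x_v), m^t_{v \to u}(x_u) : (u,v) \in E,\ x_v, x_u \in \{0,1\}\}$ on both sides of the edges at the t-th iteration, and it updates them as

$$m^{t+1}_{u \to v}(x_v) \propto \sum_{x_u \in \{0,1\}} \psi_{u,v}(x_u, x_v)\, \psi_u(x_u) \prod_{w \in \mathcal{N}(u) \setminus v} m^t_{w \to u}(x_u),$$

where $\sum_{x_v \in \{0,1\}} m^t_{u \to v}(x_v) = 1$. This is equivalent to the following updating rule on (reduced) messages $\{m^t_{u \to v}, m^t_{v \to u}\}$:

$$m^{t+1}_{u \to v} = f_{u \to v}\Big( \prod_{w \in \mathcal{N}(u) \setminus v} m^t_{w \to u} \Big),$$

where $m^t_{u \to v} := m^t_{u \to v}(1) / m^t_{u \to v}(0)$ and the function $f_{u \to v} : \mathbb{R}_+ \to \mathbb{R}_+$ is defined as

$$f_{u \to v}(x) := \frac{\psi_{u,v}(0,1)\,\psi_u(0) + \psi_{u,v}(1,1)\,\psi_u(1) \cdot x}{\psi_{u,v}(0,0)\,\psi_u(0) + \psi_{u,v}(1,0)\,\psi_u(1) \cdot x}.$$

Now a BP fixed point of messages $\{m_{u \to v}, m_{v \to u}\}$ can be naturally defined by

$$m_{u \to v} = f_{u \to v}\Big( \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u} \Big), \qquad \forall\, (u,v) \in E,$$

where one can easily argue the existence of such a fixed point using the Brouwer fixed point
theorem. This motivates the following notion of an $\varepsilon$-approximate BP fixed point.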



Definition 1 The set of messages $\{m_{u \to v}, m_{v \to u} : (u,v) \in E\}$ is called an $\varepsilon$-approximate BP
fixed point if

$$\left| \frac{m_{u \to v}}{f_{u \to v}\big( \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u} \big)} - 1 \right| \le \varepsilon, \qquad \forall\, (u,v) \in E. \qquad (2)$$

^2 This excludes the case $\psi_{u,v}(\cdot,\cdot) = 0$. However, we note that our algorithm and its analysis still work even for
the case $\psi_{u,v}(\cdot,\cdot) = 0$, such as the independent set model in [4].
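The reduced update and condition (2) can be sketched in a few lines. The code below is an illustration under assumed conventions rather than the paper's implementation: messages are stored per directed edge, `psi_edge` holds both orientations of every edge (the reverse orientation being the transpose), and the toy potentials are arbitrary.

```python
def f_uv(psi_u, psi_uv, x):
    """Reduced BP operator f_{u->v}(x); psi_uv[xu][xv] and psi_u[xu] are potentials."""
    num = psi_uv[0][1] * psi_u[0] + psi_uv[1][1] * psi_u[1] * x
    den = psi_uv[0][0] * psi_u[0] + psi_uv[1][0] * psi_u[1] * x
    return num / den


def bp_operator(msgs, neighbors, psi_node, psi_edge):
    """Apply the reduced BP update once to every directed message m_{u->v}."""
    new_msgs = {}
    for (u, v) in msgs:
        prod = 1.0
        for w in neighbors[u]:
            if w != v:
                prod *= msgs[(w, u)]
        new_msgs[(u, v)] = f_uv(psi_node[u], psi_edge[(u, v)], prod)
    return new_msgs


def is_approx_fixed_point(msgs, neighbors, psi_node, psi_edge, eps):
    """Check condition (2): |m_{u->v} / f_{u->v}(...) - 1| <= eps on every directed edge."""
    image = bp_operator(msgs, neighbors, psi_node, psi_edge)
    return all(abs(msgs[e] / image[e] - 1.0) <= eps for e in msgs)


# Toy path 0 - 1 - 2 (a tree, so plain BP settles after a few sweeps).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
psi_node = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0]}
base = [[2.0, 1.0], [1.0, 3.0]]
transposed = [[base[0][0], base[1][0]], [base[0][1], base[1][1]]]
psi_edge = {(0, 1): base, (1, 2): base, (1, 0): transposed, (2, 1): transposed}

msgs = {e: 1.0 for e in psi_edge}
for _ in range(5):
    msgs = bp_operator(msgs, neighbors, psi_node, psi_edge)
```

On a graph with cycles the same loop may oscillate, which is the convergence issue the algorithm of Section 3 is designed to avoid.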



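The BP estimates for node and edge marginal probabilities based on messages, denoted by
$\tau_v(\cdot)$, $\tau_{u,v}(\cdot,\cdot)$ for $v \in V$, $(u,v) \in E$, are defined as

$$\tau_v(x_v) \propto \psi_v(x_v) \prod_{u \in \mathcal{N}(v)} m_{u \to v}(x_v) \qquad (3)$$

$$\tau_{u,v}(x_u, x_v) \propto \psi_u(x_u)\, \psi_v(x_v)\, \psi_{u,v}(x_u, x_v) \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u}(x_u) \prod_{w \in \mathcal{N}(v) \setminus u} m_{w \to v}(x_v) \qquad (4)$$

where $\sum_{x_v} \tau_v(x_v) = 1$ and $\tau_v(x_v) = \sum_{x_u} \tau_{u,v}(x_u, x_v)$.

2.2 Bethe Approximation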

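The Bethe approximation [26] is an approximation to the logarithm of the partition function
(i.e. $\ln Z$), given by

$$\sum_{v \in V} \sum_{x_v} \tau_v(x_v) \big[ \ln \psi_v(x_v) - \ln \tau_v(x_v) \big] + \sum_{(u,v) \in E} \sum_{x_u, x_v} \tau_{u,v}(x_u, x_v) \left[ \ln \psi_{u,v}(x_u, x_v) - \ln \frac{\tau_{u,v}(x_u, x_v)}{\tau_u(x_u)\, \tau_v(x_v)} \right].$$

Under the constraints $\sum_{x_v} \tau_v(x_v) = 1$ and $\tau_v(x_v) = \sum_{x_u} \tau_{u,v}(x_u, x_v)$, it can be reduced to a
function of $y = [y_v, y_{u,v}]$ where $y_v = \tau_v(1)$ and $y_{u,v} = \tau_{u,v}(1,1)$:

$$\begin{aligned}
F(y) = {} & \sum_{v \in V} \big[ y_v \ln \psi_v(1) + (1 - y_v) \ln \psi_v(0) - y_v \ln y_v - (1 - y_v) \ln (1 - y_v) \big] \\
& + \sum_{(u,v) \in E} \Bigg\{ (1 - y_u - y_v + y_{u,v}) \left[ \ln \psi_{u,v}(0,0) - \ln \frac{1 - y_u - y_v + y_{u,v}}{(1 - y_u)(1 - y_v)} \right] \\
& \qquad\qquad + (y_u - y_{u,v}) \left[ \ln \psi_{u,v}(1,0) - \ln \frac{y_u - y_{u,v}}{y_u (1 - y_v)} \right] \\
& \qquad\qquad + (y_v - y_{u,v}) \left[ \ln \psi_{u,v}(0,1) - \ln \frac{y_v - y_{u,v}}{(1 - y_u)\, y_v} \right] \\
& \qquad\qquad + y_{u,v} \left[ \ln \psi_{u,v}(1,1) - \ln \frac{y_{u,v}}{y_u y_v} \right] \Bigg\}, \qquad (5)
\end{aligned}$$

where $-F$ is called the Bethe free energy function [26]. The gradient $\nabla F(y) = \left[ \frac{\partial F}{\partial y_v}, \frac{\partial F}{\partial y_{u,v}} \right]$ can
be obtained as

$$\frac{\partial F}{\partial y_v} = \psi^{(v)} + \ln \frac{1 - y_v}{y_v} + \sum_{u \in \mathcal{N}(v)} \ln \left( \frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}} \right) \qquad (6)$$

$$\frac{\partial F}{\partial y_{u,v}} = \psi^{(u,v)} + \ln \left( \frac{y_u - y_{u,v}}{1 - y_u - y_v + y_{u,v}} \cdot \frac{y_v - y_{u,v}}{y_{u,v}} \right) \qquad (7)$$

where

$$\psi^{(v)} := \ln \frac{\psi_v(1)}{\psi_v(0)} + \sum_{u \in \mathcal{N}(v)} \ln \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \qquad \text{and} \qquad \psi^{(u,v)} := \ln \frac{\psi_{u,v}(0,0)\, \psi_{u,v}(1,1)}{\psi_{u,v}(0,1)\, \psi_{u,v}(1,0)}.$$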
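As a sanity check on (5), (6) and (7), the following sketch evaluates $F$ and its partial derivatives numerically. It assumes a hypothetical storage layout (a list `y` of node values, a dictionary `y_e` keyed by edges, and `psi_edge[(u, v)][xu][xv]` for a single stored orientation per edge). On a tree the Bethe approximation is exact, so at the true marginals of a single-edge model one expects $F = \ln Z$ and a vanishing gradient.

```python
import math


def bethe_free_energy(y, y_e, edges, psi_node, psi_edge):
    """Evaluate F(y) of (5); y[v] = tau_v(1), y_e[(u, v)] = tau_{u,v}(1, 1)."""
    total = 0.0
    for v, yv in enumerate(y):
        total += yv * math.log(psi_node[v][1]) + (1 - yv) * math.log(psi_node[v][0])
        total -= yv * math.log(yv) + (1 - yv) * math.log(1 - yv)
    for (u, v) in edges:
        yu, yv, yuv = y[u], y[v], y_e[(u, v)]
        tau = {(0, 0): 1 - yu - yv + yuv, (1, 0): yu - yuv,
               (0, 1): yv - yuv, (1, 1): yuv}
        for (xu, xv), t in tau.items():
            tu = yu if xu else 1 - yu
            tv = yv if xv else 1 - yv
            total += t * (math.log(psi_edge[(u, v)][xu][xv]) - math.log(t / (tu * tv)))
    return total


def grad_node(v, y, y_e, edges, psi_node, psi_edge):
    """Partial derivative (6) of F with respect to y_v."""
    yv = y[v]
    g = math.log(psi_node[v][1] / psi_node[v][0]) + math.log((1 - yv) / yv)
    for (a, b) in edges:
        if v not in (a, b):
            continue
        psi = psi_edge[(a, b)]
        u = a if v == b else b
        # Ratio psi(x_u = 0, x_v = 1) / psi(0, 0), respecting the stored orientation.
        ratio = psi[0][1] / psi[0][0] if v == b else psi[1][0] / psi[0][0]
        yuv = y_e[(a, b)]
        g += math.log(ratio * (1 - yv - y[u] + yuv) / (1 - yv) * yv / (yv - yuv))
    return g


def grad_edge(e, y, y_e, psi_edge):
    """Partial derivative (7) of F with respect to y_{u,v}."""
    u, v = e
    psi = psi_edge[e]
    yuv = y_e[e]
    psi_uv = math.log(psi[0][0] * psi[1][1] / (psi[0][1] * psi[1][0]))
    return psi_uv + math.log((y[u] - yuv) * (y[v] - yuv) / ((1 - y[u] - y[v] + yuv) * yuv))


# Single edge 0 - 1; the four joint weights are 6, 1, 6, 6, so Z = 19.
edges = [(0, 1)]
psi_node = [[1.0, 2.0], [3.0, 1.0]]
psi_edge = {(0, 1): [[2.0, 1.0], [1.0, 3.0]]}
y_true = [12 / 19, 7 / 19]
y_e_true = {(0, 1): 6 / 19}
F_true = bethe_free_energy(y_true, y_e_true, edges, psi_node, psi_edge)
```

The same functions can be compared against finite differences at any interior point, which is a convenient way to catch sign errors in the gradient formulas.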

It is known that there is a one-to-one correspondence between BP fixed points and zero-gradient
points of $F$. In particular, one can obtain the following lemma, whose proof follows easily
from the algebraic expressions (6) and (7) of the gradient. The proof is presented in Appendix A.

Lemma 1 Given $\varepsilon \in [0,1)$, suppose $y = [y_v, y_{u,v}]$ satisfies $\|\nabla F(y)\|_\infty \le \varepsilon$. Then, the set of
messages $\{m_{u \to v}, m_{v \to u} : (u,v) \in E\}$ is a $6\varepsilon$-approximate BP fixed point if it is given as

$$m_{u \to v} = \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}}.$$



3 Algorithm for Computing BP Fixed Points 

In this section, we present the main result of this paper, a new message passing algorithm for
approximating a BP fixed point. From the (algebraic) relationship between approximate BP
fixed points and near-stationary points of the Bethe free energy function $F$ in Lemma 1, it is
equivalent to compute a near-stationary point $y$, i.e. one with $\|\nabla F(y)\|_2 \le \varepsilon$.

Our algorithm for finding such a point, described next, is essentially motivated by the standard
(projected) gradient-descent algorithm. The non-triviality (and novelty) lies in our choice
of an appropriate (time-varying) 'projection $[\cdot]_t$' with respect to the (time-varying) 'step-size $1/\sqrt{t}$'
at each iteration, and in the subsequent analysis of the rate of convergence.

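Algorithm A

1. Algorithm parameters:

   $\varepsilon \in (0,1)$ and $y(t) = [y_v(t) \in (0,1) : v \in V]$ at the t-th iteration.

2. $y(t)$ is updated as:

   $$y_v(t+1) = \left[ y_v(t) + \frac{1}{\sqrt{t}} \left( \psi^{(v)} + \ln \frac{1 - y_v(t)}{y_v(t)} + \sum_{u \in \mathcal{N}(v)} \ln \left( \frac{1 - y_v(t) - y_u(t) + y_{u,v}(t)}{1 - y_v(t)} \cdot \frac{y_v(t)}{y_v(t) - y_{u,v}(t)} \right) \right) \right]_t,$$

   where the projection $[\cdot]_t$ at the t-th iteration is defined as

   $$[x]_t = \begin{cases} x & \text{if } t^{-1/4} \le x \le 1 - t^{-1/4}, \\ t^{-1/4} & \text{if } x < t^{-1/4}, \\ 1 - t^{-1/4} & \text{if } x > 1 - t^{-1/4}, \end{cases}$$

   and $y_{u,v}(t) > 0$ is computed as the unique solution satisfying

   $$\psi^{(u,v)} + \ln \left( \frac{y_u(t) - y_{u,v}(t)}{1 - y_u(t) - y_v(t) + y_{u,v}(t)} \cdot \frac{y_v(t) - y_{u,v}(t)}{y_{u,v}(t)} \right) = 0 \qquad \text{and} \qquad y_{u,v}(t) < \min\{y_v(t), y_u(t)\}.$$

3. Compute messages $\{m_{u \to v}, m_{v \to u}\}$ as

   $$m_{u \to v} = \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 - y_v(t) - y_u(t) + y_{u,v}(t)}{1 - y_v(t)} \cdot \frac{y_v(t)}{y_v(t) - y_{u,v}(t)}.$$

4. Terminate if $\{m_{u \to v}, m_{v \to u}\}$ is an $\varepsilon$-approximate BP fixed point.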



The algorithm is clearly implementable through message passing, where each node $u$ sends
$y_u(t)$ to all of its neighbors $v \in \mathcal{N}(u)$ at each iteration. We also note that the part of the second
step that computes $y_{u,v}(t)$ can be carried out efficiently, since it amounts to solving a quadratic equation whose
coefficients are determined by $y_v(t)$ and $y_u(t)$. We establish the following running time of the
algorithm.

Theorem 2 Algorithm A terminates in $2^{O(\Delta)} n^2 \varepsilon^{-4} \log^3(n\varepsilon^{-1})$ iterations as long as $\psi_* = O(1)$.

The proof of Theorem 2 is presented in the following section. Note that the algorithm may
need to maintain irrational messages or rational messages with many bits. In Section 3.2, we
present a minor modification of the algorithm to fix this issue, which leads to a fully polynomial-time
approximation scheme (FPTAS) for computing an approximate BP fixed point.
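The iteration of Algorithm A can be sketched as follows. Several choices here are assumptions of the sketch and not part of the algorithm's statement: the iteration counter starts at `t0 = 17` (for $t \le 16$ the interval $[t^{-1/4}, 1 - t^{-1/4}]$ is empty), the start point is $y_v = 1/2$, floating-point arithmetic replaces exact arithmetic, and the stopping test checks $\max_v |\partial F^*/\partial y_v| \le \varepsilon/6$, which by Lemma 1 is sufficient for an $\varepsilon$-approximate BP fixed point because $\partial F / \partial y_{u,v} = 0$ holds by construction. The pairwise value is obtained from the quadratic behind step 2, written in a numerically stable form.

```python
import math


def solve_pairwise(yu, yv, beta):
    """Solve step 2's equation for y_{u,v}; beta = exp(psi^{(u,v)}).

    Returns the root that lies below min(yu, yv).
    """
    q = 1.0 + (beta - 1.0) * (yu + yv)
    disc = q * q - 4.0 * beta * (beta - 1.0) * yu * yv
    return 2.0 * beta * yu * yv / (q + math.sqrt(disc))


def project(x, t):
    """Time-varying projection [x]_t onto [t^(-1/4), 1 - t^(-1/4)]."""
    lo = t ** -0.25
    return min(max(x, lo), 1.0 - lo)


def gradient(y, edges, psi_node, psi_edge):
    """Bracketed term of step 2 for every node, with y_{u,v} recomputed per edge."""
    grad = [math.log(psi_node[v][1] / psi_node[v][0]) + math.log((1.0 - yv) / yv)
            for v, yv in enumerate(y)]
    for (u, v) in edges:
        psi = psi_edge[(u, v)]
        beta = psi[0][0] * psi[1][1] / (psi[0][1] * psi[1][0])
        yuv = solve_pairwise(y[u], y[v], beta)
        t00 = 1.0 - y[u] - y[v] + yuv
        grad[v] += math.log(psi[0][1] / psi[0][0] * t00 / (1.0 - y[v]) * y[v] / (y[v] - yuv))
        grad[u] += math.log(psi[1][0] / psi[0][0] * t00 / (1.0 - y[u]) * y[u] / (y[u] - yuv))
    return grad


def algorithm_a(edges, psi_node, psi_edge, eps, t0=17, max_iter=100000):
    """Projected iteration with step 1/sqrt(t); stops once max |gradient| <= eps / 6."""
    y = [0.5] * len(psi_node)
    for t in range(t0, t0 + max_iter):
        grad = gradient(y, edges, psi_node, psi_edge)
        if max(abs(g) for g in grad) <= eps / 6.0:
            return y, t
        step = 1.0 / math.sqrt(t)
        y = [project(yv + step * g, t) for yv, g in zip(y, grad)]
    return y, t0 + max_iter


# Single-edge toy model whose exact marginals are y = (12/19, 7/19).
edges = [(0, 1)]
psi_node = [[1.0, 2.0], [3.0, 1.0]]
psi_edge = {(0, 1): [[2.0, 1.0], [1.0, 3.0]]}
y, t_stop = algorithm_a(edges, psi_node, psi_edge, eps=1e-6)
```

Since the toy model is a tree, the iterates should approach the exact marginals; on graphs with cycles the same loop targets a stationary point of the Bethe free energy instead.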



3.1 Proof of Theorem 2 

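We first define $F^*$ on $(0,1)^n$: for $y = [y_v] \in (0,1)^n$,

$$F^*(y) = F(y, y_E),$$

where $F$ is the (original) Bethe free energy function defined in (5) and the additional vector
$y_E = [y_{u,v}] \in (0,1)^{|E|}$ is defined as the solution satisfying

$$\psi^{(u,v)} + \ln \left( \frac{y_u - y_{u,v}}{1 - y_u - y_v + y_{u,v}} \cdot \frac{y_v - y_{u,v}}{y_{u,v}} \right) = 0 \qquad \text{and} \qquad y_{u,v} < \min\{y_u, y_v\}. \qquad (8)$$

Observe that each $y_{u,v}$ is a function of $y_u, y_v$, i.e. $y_{u,v} = y_{u,v}(y_u, y_v)$. One can check that the
gradient of $F^*$ has the same form as that of $F$:

$$\frac{\partial F^*}{\partial y_v} = \psi^{(v)} + \ln \frac{1 - y_v}{y_v} + \sum_{u \in \mathcal{N}(v)} \ln \left( \frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}} \right),$$

where we recall that $y_{u,v}$ is determined by $y_u, y_v$ through (8). This implies that the updating
procedure of $y(t)$ in the algorithm is simply

$$y(t+1) = \left[ y(t) + \frac{1}{\sqrt{t}} \nabla F^*(y(t)) \right]_t. \qquad (9)$$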
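For reference, the constraint defining each pairwise value is a quadratic equation and can be solved in closed form. The sketch below uses the shorthand symbols $\beta$ and $Q$, which are introduced here only for illustration; one can check that the root taken with the minus sign is the one lying below $\min\{y_u, y_v\}$.

```latex
\begin{aligned}
&\beta\,(y_u - y_{u,v})(y_v - y_{u,v}) = (1 - y_u - y_v + y_{u,v})\,y_{u,v},
  \qquad \beta := e^{\psi^{(u,v)}},\\
&\Longleftrightarrow\quad (\beta - 1)\,y_{u,v}^2 - Q\,y_{u,v} + \beta\,y_u y_v = 0,
  \qquad Q := 1 + (\beta - 1)(y_u + y_v),\\
&y_{u,v} = \frac{Q - \sqrt{Q^2 - 4\beta(\beta - 1)\,y_u y_v}}{2(\beta - 1)} \quad (\beta \neq 1),
  \qquad y_{u,v} = y_u y_v \quad (\beta = 1).
\end{aligned}
```

As a quick check, $\beta = 6$, $y_u = 12/19$ and $y_v = 7/19$ give $Q = 6$ and $y_{u,v} = 6/19$, for which both sides of the first line equal $36/361$.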



Based on this interpretation, we start to prove the running time of the algorithm by stating 
the following key lemma. 



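Lemma 3 Define $\delta > 0$ as the largest real number that satisfies the following:

$$\delta \le \frac{1/2}{2(\Delta+1)\,\psi_*^{4\Delta+1} + 1} \qquad \text{and} \qquad 4\delta \ln \frac{1 - 2\delta}{2\delta} \le 1.$$

Then,

$$y(t) \in D := [\delta, 1 - \delta]^n, \qquad \forall\, t \ge t_* := \delta^{-4}.$$

Proof. First observe that $y(t_*) \in D$ due to our choice of the projection $[\cdot]_t$ and $t_*^{-1/4} = \delta$. Hence, it
suffices to establish the following three steps: for all $v \in V$ and $t \ge t_*$,

$$\frac{\partial F^*}{\partial y_v} \le 0 \qquad \text{if } y_v \ge 1 - 2\delta \text{ and } y \in D, \qquad (10)$$

$$\frac{\partial F^*}{\partial y_v} \ge 0 \qquad \text{if } y_v \le 2\delta \text{ and } y \in D, \qquad (11)$$

$$\frac{1}{\sqrt{t}} \left| \frac{\partial F^*}{\partial y_v} \right| \le \frac{\delta}{2} \qquad \text{if } y \in D. \qquad (12)$$

From (9), (10), (11) and (12), it clearly follows that $y(t) \in D$ for all $t \ge t_*$.

Proof of (10). We first provide a proof of (10). To this end, if $y \in (0,1)^n$, we have

$$\begin{aligned}
\frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}}
&= \frac{1 + \frac{y_{u,v}}{y_v - y_{u,v}}}{1 + \frac{y_u - y_{u,v}}{1 - y_u - y_v + y_{u,v}}}
= \frac{1 + \frac{y_{u,v}}{y_v - y_{u,v}}}{1 + e^{-\psi^{(u,v)}} \cdot \frac{y_{u,v}}{y_v - y_{u,v}}} \\
&\le \max\{1, e^{\psi^{(u,v)}}\} \\
&\le \psi_*^4, \qquad (13)
\end{aligned}$$

where we use the definition (8) of $y_{u,v}$. Using this, (10) follows as

$$\begin{aligned}
\frac{\partial F^*}{\partial y_v}
&= \psi^{(v)} + \ln \frac{1 - y_v}{y_v} + \sum_{u \in \mathcal{N}(v)} \ln \left( \frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}} \right) \\
&\le \ln\big(2(\Delta+1)\psi_*\big) + \ln \frac{2\delta}{1 - 2\delta} + \Delta \ln \psi_*^4 \\
&= \ln \frac{2(\Delta+1)\,\psi_*^{4\Delta+1}}{\frac{1}{2\delta} - 1} \\
&\le 0,
\end{aligned}$$

where the last inequality is from our choice of $\delta \le \frac{1/2}{2(\Delta+1)\psi_*^{4\Delta+1} + 1}$.

Proof of (11). Second, we provide a proof of (11). Similarly to (13), we have

$$\frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}} \ge \min\{1, e^{\psi^{(u,v)}}\} \ge \frac{1}{\psi_*^4}. \qquad (14)$$

Hence, (11) follows as

$$\begin{aligned}
\frac{\partial F^*}{\partial y_v}
&= \psi^{(v)} + \ln \frac{1 - y_v}{y_v} + \sum_{u \in \mathcal{N}(v)} \ln \left( \frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}} \right) \\
&\ge -\ln\big(2(\Delta+1)\psi_*\big) + \ln \frac{1 - 2\delta}{2\delta} + \Delta \ln \frac{1}{\psi_*^4} \\
&= \ln \frac{\frac{1}{2\delta} - 1}{2(\Delta+1)\,\psi_*^{4\Delta+1}} \\
&\ge 0,
\end{aligned}$$

where the last inequality is again from our choice of $\delta \le \frac{1/2}{2(\Delta+1)\psi_*^{4\Delta+1} + 1}$.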

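Proof of (12). Finally, we provide a proof of (12). Using (13) and (14), it follows that

$$\begin{aligned}
\left| \frac{\partial F^*}{\partial y_v} \right|
&\le \left| \psi^{(v)} \right| + \left| \ln \frac{1 - y_v}{y_v} \right| + \sum_{u \in \mathcal{N}(v)} \left| \ln \left( \frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}} \right) \right| \\
&\le \ln\big(2(\Delta+1)\psi_*\big) + \ln \frac{1 - 2\delta}{2\delta} + \Delta \ln \psi_*^4 \\
&\le 2 \ln \frac{1 - 2\delta}{2\delta} \\
&\le \frac{\delta}{2} \sqrt{t},
\end{aligned}$$

where the last inequalities are from our choices of $\delta$ and $t \ge t_* = \delta^{-4}$, which imply $2 \ln \frac{1 - 2\delta}{2\delta} \le \frac{1}{2\delta} \le \frac{\delta}{2}\sqrt{t}$.
This completes the proof of Lemma 3. □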



Using the above lemma, we will show that the algorithm terminates in $2^{O(\Delta)} n^2 \varepsilon^{-4} \log^3(n\varepsilon^{-1})$
iterations, at which point it outputs an $\varepsilon$-approximate BP fixed point. We first explain why it suffices to
show the following:

$$\sum_{t = t_*}^{T-1} c_t \cdot \|\nabla F^*(y(t))\|_2^2 \le \frac{2^{O(\Delta)}\, n \log T}{\sqrt{T}}, \qquad (15)$$

where $c_t = \frac{1/\sqrt{t}}{\sum_{s = t_*}^{T-1} 1/\sqrt{s}}$. The above inequality suggests that we can choose $T = 2^{O(\Delta)} n^2 \varepsilon^{-4} \log^3(n\varepsilon^{-1})$
such that

$$\sum_{t = t_*}^{T-1} c_t \cdot \|\nabla F^*(y(t))\|_2^2 \le \left( \frac{\varepsilon}{6} \right)^2.$$

From $\sum_{t = t_*}^{T-1} c_t = 1$, there exists $t \in [t_*, T]$ such that $\|\nabla F^*(y(t))\|_2 \le \varepsilon/6$. Further, observe that
if $y_E(t)$ is defined from (8),

$$\|\nabla F(y(t), y_E(t))\|_2 \le \|\nabla F^*(y(t))\|_2 \le \varepsilon/6.$$

Then, Lemma 1 implies that the messages computed at the t-th iteration form an $\varepsilon$-approximate
BP fixed point.

Now we proceed toward establishing the desired inequality (15). The important implication
of Lemma 3 is that the algorithm does not need the projection $[\cdot]_t$ after the $t_*$-th iteration. In
other words, from Lemma 3 and (9), we have that

$$y(t+1) = y(t) + \frac{1}{\sqrt{t}} \nabla F^*(y(t)), \qquad \forall\, t \ge t_*.$$
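In what follows, we will assume $t \ge t_*$ and $y(t) \in D$ from Lemma 3.
Using Taylor's expansion, we have

$$\begin{aligned}
F^*(y(t+1)) &= F^*\Big( y(t) + \tfrac{1}{\sqrt{t}} \nabla F^*(y(t)) \Big) \\
&= F^*(y(t)) + \nabla F^*(y(t))^T \cdot \tfrac{1}{\sqrt{t}} \nabla F^*(y(t)) + \tfrac{1}{2} \cdot \tfrac{1}{\sqrt{t}} \nabla F^*(y(t))^T \cdot R \cdot \tfrac{1}{\sqrt{t}} \nabla F^*(y(t)), \qquad (16)
\end{aligned}$$

where $R$ is an $n \times n$ matrix such that

$$|R_{vw}| \le \sup_{y \in B} \left| \frac{\partial^2 F^*}{\partial y_v \partial y_w} \right|$$

and $B$ is an $L_\infty$-ball in $\mathbb{R}^n$ centered at $y(t) \in D$ with radius

$$r = \max_{v \in V} \frac{1}{\sqrt{t}} \left| \frac{\partial F^*}{\partial y_v}(y(t)) \right|.$$

From (12), we know $r \le \delta/2$. Hence, $y_v \in [\delta/2, 1 - \delta/2]$ for $y = [y_v] \in B$. Using this, one can check that

$$\sup_{y \in B} \left| \frac{\partial^2 F^*}{\partial y_v \partial y_w} \right| \le \begin{cases} 2^{O(\Delta)} & \text{if } v = w \text{ or } (v,w) \in E, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, using these bounds, the equality (16) reduces to

$$\begin{aligned}
F^*(y(t+1)) &\ge F^*(y(t)) + \frac{1}{\sqrt{t}} \|\nabla F^*(y(t))\|_2^2 - \frac{1}{2t} \sum_{v,w} |R_{vw}| \left| \frac{\partial F^*}{\partial y_v} \right| \left| \frac{\partial F^*}{\partial y_w} \right| \\
&\ge F^*(y(t)) + \frac{1}{\sqrt{t}} \|\nabla F^*(y(t))\|_2^2 - \frac{O\big(2^{O(\Delta)} n\big)}{t},
\end{aligned}$$

since $|E| \le \Delta \cdot n$. If we sum the above inequality over $t$ from $t_*$ to $T-1$, we have

$$F^*(y(T)) \ge F^*(y(t_*)) + \sum_{t = t_*}^{T-1} \frac{1}{\sqrt{t}} \|\nabla F^*(y(t))\|_2^2 - O\big(2^{O(\Delta)} n\big) \sum_{t = t_*}^{T-1} \frac{1}{t}.$$

Since $|F^*(y)| = O(\Delta n)$ for $y \in D$, we obtain

$$\sum_{t = t_*}^{T-1} \frac{1}{\sqrt{t}} \|\nabla F^*(y(t))\|_2^2 \le O(\Delta n) + O\big(2^{O(\Delta)} n\big) \sum_{t = t_*}^{T-1} \frac{1}{t}.$$

Thus, we finally obtain the desired conclusion (15) as follows.

$$\begin{aligned}
\sum_{t = t_*}^{T-1} c_t \cdot \|\nabla F^*(y(t))\|_2^2
&= \frac{1}{\sum_{t = t_*}^{T-1} \frac{1}{\sqrt{t}}} \sum_{t = t_*}^{T-1} \frac{1}{\sqrt{t}} \|\nabla F^*(y(t))\|_2^2 \\
&\le \frac{1}{\sum_{t = t_*}^{T-1} \frac{1}{\sqrt{t}}} \left( O(\Delta n) + O\big(2^{O(\Delta)} n\big) \sum_{t = t_*}^{T-1} \frac{1}{t} \right) \\
&\le \frac{2^{O(\Delta)}\, n \log T}{\sqrt{T}},
\end{aligned}$$

where we recall that $t_* = \delta^{-4} = 2^{O(\Delta)}$. This completes the proof of Theorem 2.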
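The last inequality in the chain rests on two elementary sum estimates, sketched here under the additional assumption $T \ge 2 t_*$ (a convenience of this sketch, harmless because the target $T$ is far larger than $t_* = 2^{O(\Delta)}$):

```latex
\begin{aligned}
\sum_{t = t_*}^{T-1} \frac{1}{\sqrt{t}} &\;\ge\; \frac{T - t_*}{\sqrt{T}} \;\ge\; \frac{\sqrt{T}}{2},
\qquad\qquad
\sum_{t = t_*}^{T-1} \frac{1}{t} \;\le\; 1 + \ln T,\\[4pt]
\frac{O(\Delta n) + O\big(2^{O(\Delta)} n\big)\,(1 + \ln T)}{\sqrt{T}/2}
&\;=\; \frac{2^{O(\Delta)}\, n \log T}{\sqrt{T}}.
\end{aligned}
```

The first estimate uses $1/\sqrt{t} \ge 1/\sqrt{T}$ for each of the $T - t_*$ terms; the second is the standard harmonic-sum bound.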
3.2 Modification to FPTAS



In this section, we provide a minor modification of Algorithm A in Section 3, to establish a fully
polynomial-time approximation scheme (FPTAS) for the BP fixed point computation. We will
show that only a polynomial number of bits needs to be maintained for each message $y_v(t)$. To this
end, we define the following function $g^t = [g^t_v]$, which describes the updating rule (at the t-th
iteration) of Algorithm A, i.e.

$$y(t+1) = g^t(y(t)) \qquad \text{under Algorithm A.}$$

Now we formally propose the following algorithm, a minor modification of Algorithm A.

Algorithm B

1. Algorithm parameters:

   $\varepsilon \in (0,1)$ and $z(t) = [z_v(t) \in (0,1) : v \in V]$ at the t-th iteration,
   where $z_v(t)$ has $k$ bits (i.e. $2^k z_v(t) \in \mathbb{Z}$) for all $t \ge 0$, $v \in V$.

2. $z(t)$ is updated as:

   $$\left| z_v(t+1) - g^t_v(z(t)) \right| \le \frac{1}{2^k}.$$

3. Compute a set of messages $\{m_{u \to v}, m_{v \to u}\}$ satisfying

   $$m_{u \to v} = \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 - z_v(t) - z_u(t) + z_{u,v}(t)}{1 - z_v(t)} \cdot \frac{z_v(t)}{z_v(t) - z_{u,v}(t)},$$

   where $z_{u,v}(t) > 0$ is computed to satisfy

   $$\psi^{(u,v)} + \ln \left( \frac{z_u(t) - z_{u,v}(t)}{1 - z_u(t) - z_v(t) + z_{u,v}(t)} \cdot \frac{z_v(t) - z_{u,v}(t)}{z_{u,v}(t)} \right) = 0.$$

4. Terminate if $\{m_{u \to v}, m_{v \to u}\}$ is an $\varepsilon$-approximate BP fixed point.

We note that each step in the above algorithm is executable in a polynomial number of bitwise
operations with respect to $\Delta$, $1/\varepsilon$, $k$ and $n$. The second step, computing $g^t_v$, consists of $O(\Delta)$
arithmetic operations: logarithm, division, addition, square root and multiplication. Furthermore,
the equations in the third step for computing $\{m_{u \to v}, m_{v \to u}\}$ can be solved in a polynomial
number of bitwise operations with respect to $1/\varepsilon$ and $k$.
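The $k$-bit bookkeeping of Algorithm B can be illustrated with exact rational arithmetic. This is a sketch under assumptions, not the paper's procedure: `round_to_k_bits` is a hypothetical helper that rounds to the nearest multiple of $2^{-k}$, and `error_bound` iterates the map $x \mapsto L \cdot x + n/2^k$ that arises when per-coordinate rounding errors of size $2^{-k}$ are pushed through an $L$-Lipschitz update on $n$ coordinates.

```python
from fractions import Fraction


def round_to_k_bits(x, k):
    """Return the multiple of 2**-k nearest to x as an exact Fraction."""
    scale = 1 << k
    return Fraction(round(x * scale), scale)


def error_bound(lipschitz, n, k, steps):
    """Iterate x -> L * x + n / 2**k from 0; bounds the accumulated l1 rounding error."""
    x = Fraction(0)
    for _ in range(steps):
        x = lipschitz * x + Fraction(n, 1 << k)
    return x


z = round_to_k_bits(0.3, 8)        # 0.3 * 256 = 76.8, which rounds to 77
drift = error_bound(2, 3, 10, 4)   # (1 + 2 + 4 + 8) * 3 / 1024
```

The accumulated error grows geometrically in the number of iterations, which is why the number of bits $k$ has to scale with the iteration count.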

Now we state the following theorem, which shows that one can choose $k$ as a polynomial in
$n$, $1/\varepsilon$ and $2^{\Delta}$. This implies that Algorithm B is an FPTAS for such a choice of $k$ as
long as $\Delta = O(\log n)$. We note that one can obtain an explicit bound on $k$ in terms of $\Delta$, $\psi_*$,
$n$ and $\varepsilon$ by explicitly calculating each step in our proof.^3

^3 Another naive way to avoid such an explicit choice of $k$ is to run Algorithm B 'polynomially' many times,
increasing $k$ (as well as the number of iterations) until it succeeds.
Theorem 4 For $\psi_* = O(1)$, there exists a $k = 2^{O(\Delta)} n^2 \varepsilon^{-4} \log^3(n\varepsilon^{-1})$ such that Algorithm B
terminates in $2^{O(\Delta)} n^2 \varepsilon^{-4} \log^3(n\varepsilon^{-1})$ iterations.

Proof. To begin with, one can check that $g^t$ is $2^{O(\Delta)}$-Lipschitz in $[\delta/2, 1 - \delta/2]^n$ where $\delta = 1/2^{O(\Delta)}$ is the constant in Lemma 3. Formally speaking, for all $t \ge 0$ and $y_1, y_2 \in [\delta/2, 1 - \delta/2]^n$,

$$\|g^t(y_1) - g^t(y_2)\|_1 \le L \cdot \|y_1 - y_2\|_1,$$

where $L = 2^{O(\Delta)}$. Let $y(t)$ and $z(t)$ be the parameters of Algorithm A and B, respectively. Initially,
suppose $y(0) = z(0)$. Then, if $y(t), z(t) \in [\delta/2, 1 - \delta/2]^n$,

$$\begin{aligned}
\|y(t+1) - z(t+1)\|_1 &\le \|g^t(y(t)) - g^t(z(t))\|_1 + \frac{n}{2^k} \\
&\le L \cdot \|y(t) - z(t)\|_1 + \frac{n}{2^k} \\
&= h\big(\|y(t) - z(t)\|_1\big), \qquad (17)
\end{aligned}$$

where we define $h(x) := L \cdot x + \frac{n}{2^k}$. From Lemma 3, we know $y(t) \in [\delta, 1 - \delta]^n$ for all $t \ge 0$.
Thus, (17) holds for $t \ge 0$ with $h^{(t)}(0) \le \delta/2$.^4 Further, using $h^{(t)}(0) \le (L+1)^t \cdot n/2^k$, it follows
that

$$\|y(t) - z(t)\|_1 \le \gamma, \qquad \text{for } t \le T := \frac{\log\big(2^k \gamma / n\big)}{\log(L+1)}, \qquad (18)$$

where $\gamma \le \delta/2$ will be decided later.

Now it is not hard to see that for $t \le T$ (i.e. $y(t), z(t) \in [\delta/2, 1 - \delta/2]^n$ and $\|y(t) - z(t)\|_1 \le \gamma$),
if $\|\nabla F^*(y(t))\|_\infty \le \gamma$, then $\|\nabla F^*(z(t))\|_\infty \le \gamma \cdot 2^{O(\Delta)}$. Observe that one can choose $\gamma = \varepsilon / 2^{O(\Delta)}$
and $k = 2^{O(\Delta)} n^2 \gamma^{-4} \log^3(n\gamma^{-1})$ so that Algorithm A has $\|\nabla F^*(y(t))\|_\infty \le \gamma$ in $T$ iterations,
and hence Algorithm B has $\|\nabla F^*(z(t))\|_\infty \le \varepsilon/6$ in the same number of iterations. Therefore,
the conclusion of Theorem 4 follows from Lemma 1. □



4 Extension to Non-binary Graphical Models 

In this section, we discuss how to design an algorithm for non-binary graphical models similar to those of the previous section. Here we provide a high-level description, but one can check the further details based on arguments identical to those for the binary case in Section 3.

Consider non-binary random variables $\{x_v\}$ in the graphical model described in Section 2, i.e., $x_v \in [Q] = \{0, 1, \ldots, Q\}$ for some $Q \in \mathbb{N}$. Hence, the potential functions $\psi_{u,v}$ and $\psi_v$ are functions on $[Q]^2$ and $[Q]$, respectively. We remind the reader that the essential goal is to find a near-stationary point of the following Bethe approximation under the constraints $\sum_{x_v} \tau_v(x_v) = 1$ and $\tau_v(x_v) = \sum_{x_u} \tau_{u,v}(x_u, x_v)$:

$$\sum_{v \in V} \sum_{x_v} \tau_v(x_v) \big[ \ln \psi_v(x_v) - \ln \tau_v(x_v) \big] + \sum_{(u,v) \in E} \sum_{x_u, x_v} \tau_{u,v}(x_u, x_v) \left[ \ln \psi_{u,v}(x_u, x_v) - \ln \frac{\tau_{u,v}(x_u, x_v)}{\tau_u(x_u)\, \tau_v(x_v)} \right].$$

First, for every pair $(p,q) \in [Q]^2$, one can define $F^*_{p,q}$ on $y = [y_v] \in [0,1]^n$ similarly to $F^*$ in Section 3.1, by (a) fixing all variables except for $\{\tau_v(p), \tau_v(q) : v \in V\}$ and $\{\tau_{u,v}(p,q) : (u,v) \in E\}$, (b) setting $y_v = \tau_v(p)$, and (c) considering the constraints $\sum_{x_v} \tau_v(x_v) = 1$ and $\tau_v(x_v) = \sum_{x_u} \tau_{u,v}(x_u, x_v)$. Now the algorithm for a non-binary graphical model maintains variables

$$\{\tau^t_v(x_v) : v \in V,\ x_v \in [Q]\} \qquad \text{and} \qquad \{\tau^t_{u,v}(x_u, x_v) : (u,v) \in E,\ (x_u, x_v) \in [Q]^2\},$$

at the t-th iteration. At each round, it picks a pair $(p,q) \in [Q]^2$ and updates the vector $\{\tau^t_v(p)\}$ as

$$\tau^{t+1}(p) = \tau^t(p) + \frac{\alpha}{\sqrt{t}} \nabla F^*_{p,q}\big( \{\tau^t_v(p) : v \in V\} \big),$$

for some constant $\alpha > 0$. $\{\tau^t_v(q) : v \in V\}$ and $\{\tau^t_{u,v}(p,q) : (u,v) \in E\}$ can also be updated according to $\{\tau^{t+1}_v(p) : v \in V\}$ so that they satisfy the constraints $\sum_{x_v} \tau_v(x_v) = 1$ and $\tau_v(x_v) = \sum_{x_u} \tau_{u,v}(x_u, x_v)$ as well as the quadratic equations corresponding to (8) of the binary case. From arguments similar to those in the proof of Lemma 3, one can show that it is possible to choose $\alpha$ small enough so that the updated marginal probabilities $\{\tau^{t+1}_v(p), \tau^{t+1}_v(q) : v \in V\}$ and $\{\tau^{t+1}_{u,v}(p,q) : (u,v) \in E\}$ are always valid (i.e., positive). Hence, the convergence rate analysis follows as in Section 3.1. We omit further details in this paper.

^4 $h^{(t)}$ is the function composing $h$ 't times', i.e. $h^{(t)} = h \circ h^{(t-1)}$ and $h^{(1)} = h$.

5 Conclusion 

In the last decade, exciting progress has been made on understanding computationally hard
problems in computer science using a variety of methods from statistical physics. The belief propagation
(BP) algorithm and its variants are along this line and suggest solving certain 'relaxations'
of hard problems. In this paper, we address the question of whether the relaxation is indeed
computationally easy to solve in a strong sense. We believe that our rigorous complexity analysis of
the BP-relaxation is an important step toward guaranteeing the complexity of BP-based algorithms.
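A Proof of Lemma 1

To simplify notation, we use $a = b \cdot e^{\pm\varepsilon}$ to mean $b \cdot e^{-\varepsilon} \le a \le b \cdot e^{\varepsilon}$. Now from $\left| \frac{\partial F}{\partial y_u} \right| \le \varepsilon$ and $\left| \frac{\partial F}{\partial y_{u,v}} \right| \le \varepsilon$, we have

$$\frac{\psi_u(1)}{\psi_u(0)} \cdot \frac{\psi_{u,v}(1,0)}{\psi_{u,v}(0,0)} \cdot \frac{1 - y_v - y_u + y_{u,v}}{y_u - y_{u,v}} \cdot \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u} = e^{\pm\varepsilon},$$

$$e^{\psi^{(u,v)}} \cdot \frac{y_u - y_{u,v}}{1 - y_u - y_v + y_{u,v}} \cdot \frac{y_v - y_{u,v}}{y_{u,v}} = e^{\pm\varepsilon}.$$

Using the above inequalities, the desired conclusion of Lemma 1 follows as

$$\begin{aligned}
m_{u \to v} &= \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 - y_v - y_u + y_{u,v}}{1 - y_v} \cdot \frac{y_v}{y_v - y_{u,v}} \\
&= \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 + \frac{y_{u,v}}{y_v - y_{u,v}}}{1 + \frac{y_u - y_{u,v}}{1 - y_u - y_v + y_{u,v}}} \\
&= \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 + e^{\pm\varepsilon} \cdot e^{\psi^{(u,v)}} \frac{y_u - y_{u,v}}{1 - y_u - y_v + y_{u,v}}}{1 + \frac{y_u - y_{u,v}}{1 - y_u - y_v + y_{u,v}}} \\
&= \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 + e^{\pm 2\varepsilon} \cdot e^{\psi^{(u,v)}} \frac{\psi_u(1)}{\psi_u(0)} \frac{\psi_{u,v}(1,0)}{\psi_{u,v}(0,0)} \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u}}{1 + e^{\pm\varepsilon} \cdot \frac{\psi_u(1)}{\psi_u(0)} \frac{\psi_{u,v}(1,0)}{\psi_{u,v}(0,0)} \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u}} \\
&= \frac{\psi_{u,v}(0,1)}{\psi_{u,v}(0,0)} \cdot \frac{1 + e^{\pm 2\varepsilon} \cdot \frac{\psi_u(1)\, \psi_{u,v}(1,1)}{\psi_u(0)\, \psi_{u,v}(0,1)} \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u}}{1 + e^{\pm\varepsilon} \cdot \frac{\psi_u(1)\, \psi_{u,v}(1,0)}{\psi_u(0)\, \psi_{u,v}(0,0)} \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u}} \\
&= (1 \pm 6\varepsilon) \cdot f_{u \to v}\Big( \prod_{w \in \mathcal{N}(u) \setminus v} m_{w \to u} \Big).
\end{aligned}$$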